The Weight of Phonetic Substance in the Structure of Sound Inventories
Abstract
In the research field initiated by Liljencrants & Lindblom (1972), we illustrate the possibility of giving substance to phonology, predicting the structure of phonological systems with non-phonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). For vowel systems, we proposed the Dispersion-Focalisation Theory (Schwartz et al., 1997b). With the DFT, we can predict vowel systems using two competing perceptual constraints weighted by two parameters, λ and α respectively. The first aims at increasing auditory distances between vowel spectra (dispersion); the second aims at increasing the perceptual salience of each spectrum through formant proximities (focalisation). We also introduced new variants based on concepts from physics, namely the phase space (λ, α) and the polymorphism of a given phase, or superstructures in phonological organisations (Vallée et al., 1999), which allow us to generate 85.6% of the 342 UPSID systems with 3 to 7 vowel qualities. No similar theory for consonants seems to exist yet. Therefore we present in detail a typology of consonants, and then suggest ways to explain the predominance of plosives over fricatives and of voiceless over voiced consonants by i) comparing them with language acquisition data at the babbling stage and looking at the capacity to acquire relatively different linguistic systems in relation with the main degrees of freedom of the articulators; ii) showing that the places “preferred” for each manner are at least partly conditioned by the morphological constraints that facilitate or complicate, make possible or impossible, the needed articulatory gestures, e.g. the complexity of the articulatory control for voicing and the aerodynamics of fricatives. A rather strict coordination between the glottis and the oral constriction is needed to produce acceptable voiced fricatives (Mawass et al., 2000).
We determine that the region where the combinations of Ag (glottal area) and Ac (constriction area) values result in a balance between the voice and noise components is indeed very narrow. We thus demonstrate that some of the main tendencies in the phonological vowel and consonant structures of the world’s languages can be explained at least partly by sensorimotor constraints, and argue that phonology can actually take part in a theory of Perception-for-Action-Control.

1 Phonology in a substance-based linguistics

Speech communication operates on two highly structured levels: the system itself and its physical realisation. This is probably the reason why speech communication is so efficient compared with other means of communication used by man or animal. The terms language and speech refer to these two levels, separated by Saussurean structural linguistics into form and substance, and reconsidered by generative grammar under the terms competence and performance. Throughout the 20th century, several axioms at the core of the structuralist, and subsequently generativist, approaches have conditioned the relationship between phonetics and linguistics:
• the language/speech dichotomy;
• the independence of these two concepts;
• the primacy of language over speech.

Vallée, Boë, Schwartz, Badin & Abry 146

This distinction is the result of a particular methodological approach. Linguistics, in order to make empirical data intelligible, separates the study of a sound system – its field of research – from the issue of its physical realisation, which may be variable and polymorphous (Ducrot & Schaeffer, 1995, p. 245). These methodological principles, cumulative in their effects, marginalised any attempt to reveal interactions between the major tendencies observed in the phonological systems of the world’s languages – their universals – and the articulatory and acoustic characteristics of their physical realisation.
They isolated linguistics in a reductionist internalism and influenced the presuppositions on which phonology was founded. According to these principles, phonological units cannot be defined by substantial properties but only with respect to their relative position within the system, and the question of their universality no longer arises. On the threshold of the 21st century, a number of approaches in contemporary linguistics and phonology are still characterised by a strong internalist stance, often presented as an advantage, which goes as far as an outright refusal to take into account hypotheses, models, and results obtained by connected disciplines which have language and speech in their field of research. This refusal to consider the evidence of a relationship between form and substance, reiterated throughout the century, is perhaps a unique example in the history of twentieth-century science: testing data and models, whatever their provenance, should form an intrinsic part of any scientific approach. In 1952, Jakobson, Fant, and Halle introduced, in Preliminaries to Speech Analysis (PSA), a new conception of phonology by linking phonemic features to acoustic correlates and speech perception. Even if their proposed features – too general and poorly quantified – did not really clarify the relation between form and substance, the relationship between phonology and phonetics, interrupted for almost two decades, was discussed anew. Generative phonology retained from PSA the idea of a universal system of binary features. In The Sound Pattern of English (SPE) in 1968, Chomsky and Halle replaced the traditional acoustico-perceptual specification by a universal phonetic representation, expressed in terms of more numerous articulatory features which were precisely defined and well documented. This can be considered a very important advance in the framework of phonetic description and the relationship between form and substance.
It might have been expected that generativist phonologists would have connected their work to articulatory measurements. In fact, the descriptive prolegomena of SPE have remained unfollowed, and the proposed features have been used in phonological descriptions only as part of a purely symbolic formalism, as the authors themselves stated (Chomsky & Halle, 1968, p. 274). To compensate for the lack of naturalism, Chomsky and Halle reintroduced the Theory of Markedness (inherited from the phonology of Trubetzkoy). More recently, Optimality Theory was proposed to preserve those universal constraints which reveal the unity of language. For Prince and Smolensky (1993), the universal grammar can essentially be considered a set of ordered, often conflicting constraints which regulate the well-formedness of the representations from which individual grammars are constructed. These constraints are always active, and languages are then distinguished by the way in which conflicts are resolved. Once again, however, it is necessary to proceed with criteria which do not reduce the reasoning to a straightforward tautology. In fact, the Saussurean dogma taken up by Chomsky (1965) – “The classical Saussurean hypothesis of the logical priority of the study of language (and of the generative grammar that describes it) seems almost incontrovertible” – has not truly been called into question, except in 1972, with a new perspective that paved the way towards a whole new sweep of research called “substance-oriented” linguistics, brought about by the Maximal Dispersion Theory (Liljencrants & Lindblom, 1972) and the Quantal Theory (Stevens, 1972). We place ourselves in these two approaches initiated in the seventies, using recent work on the typology of sound structures (Vallée, 1994; Schwartz et al.,
1997a; Stefanuto & Vallée, 1999), and trying to show that some of these tendencies refer to biological constraints on the human speech production and speech perception systems, that is, to the substance, not to the form. Our aim is not to refute the existence of a formal phonological level with its intrinsic formal principles and rules, but to try to determine, and if possible quantitatively model, a set of constraints coming from the speech substance and capable of having played a part in the emergence of this formal system – and therefore to throw some light on phonological facts which could otherwise appear arbitrary. Since Lindblom's work, a number of elements are now available to integrate phonology into a substance-based theory called the Perception-for-Action-Control Theory (Schwartz et al., 2002): “this theory should be able to show how the choice of speech units inside the phonological system may be constrained and patterned by the inherent limitations and intrinsic properties of the speech perception system – and its indissociable companion, the speech production system”. The core of the proposal is that “a listener might follow the vocalisations of his speaking partner, in order perhaps to understand them, but at least certainly to imitate and learn: in other words, perception enables a listener to specify the control of his future actions as a speaker. On the other hand, [...] the perceptual representations of speech gestures transform, deform, shape the speaker's gestures in the listener's mind, and hence provide templates that in return also help to specify the control of the speaking partner's own actions.” This approach is centred on the co-structuring of the perception and action systems in relation with phonology.
The Perception-for-Action-Control Theory, however, keeps its distance both from “an "auditory" theory in which the sensory-interpretative chain is considered independently of the patterning of sounds by speech gestures, in the search of some "direct link" between sounds and phonemes; and from a "motor" theory [...] in which perception is nothing but a mirror of action, in the claim of a direct link between sounds and gestures.” The studies presented here are at the core of the relationship between phonology and perception for action control. We attempt to show that phonemes – vowels and consonants – are not arbitrary phonological units: phonological systems are in part co-structured by speech perception and action. Considering that phonology should contain the set of formal structures characterising conscious mechanisms for speech control, it is only logical to assume that it is not independent of the ability of the speech production system to produce gestures, and of the speech perception system to recover and shape these gestures. This is the basis both of a theory we have developed for dealing with oral vowel systems, in the line of Lindblom's Dispersion Theory – the Dispersion-Focalisation Theory (DFT) presented in the following section – and of a set of suggested ways to explain the predominance of plosives over fricatives and of voiceless over voiced consonants, developed from our typological analysis based on the UPSID phonological systems. In fact, we adopt an epistemological framework using data “external” to phonological description: speech production and speech perception constraints, to which it is possible to add some data on ontogenesis and language disturbances (cf. MacNeilage, 1998). Following this approach initiated by Lindblom, models currently permit the prediction of the main tendencies observed in sound systems.
It is thus possible to take a close look at the problem of phonological structures and their changes systematically, to establish a precise diagnosis of what can be attributed to speech production/perception, and to list the questions which must instead be addressed to linguistics and sociolinguistics. With such an approach, we do not fall into the trap of an inductive approach, which consists in inferring general laws from isolated observations and can lead us to the error of presupposing the conclusion. We finally illustrate and discuss the inescapable fact that the relationship between phonology and phonetics has to constitute a research field of the linguistic sciences.

2 The weight of phonetic substance in vowel structures

2.1 Prediction of the phonological structure of vowel systems: the DFT

2.1.1 General principles

Since the beginning of the 70s, several proposals have been made to predict the phonological structure of vowel systems with non-phonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). The so-called “sufficient perceptual contrast” theory (Lindblom, 1986) provides the best global fit with phonological data. However, to overcome its two main problems (namely the excessive number of high non-peripheral vowels in the model predictions and the impossibility of predicting the [ ] series within the high vowel set), we proposed at ICP a theory based on two principles: dispersion and focalisation. These principles specify two basic properties that vowel gestures should have in order to provide a viable sound system for communication. Firstly, gestures should provide sufficiently different acoustic patterns to allow the perception system to recover them without confusions or ambiguities: this is dispersion.
Secondly, they should provide salient spectral patterns (formant convergence in vowel spectra), easy to process and characterise in the ear: this is focalisation. While auditory dispersion is a classical concept, focalisation is a principle we introduced (Schwartz & Escudier, 1987, 1989). The Dispersion-Focalisation Theory (DFT) (Schwartz et al., 1997b) allows us to predict vowel systems through a competition between two perceptual costs: for a given number of vowels, the most frequent system in the world’s languages is supposed to be obtained by minimising a global criterion combining a structural dispersion cost and a local focalisation cost.

2.1.2 Implementation

Each vowel is characterised by the formants of its spectrum, that is F1, F2, F3 and F4, expressed on a perceptual Bark scale. The (F2, F3, F4) set allows us to compute an integrated “effective perceptual formant” F'2. In the DFT, we define a vowel system as a set of vowels in the maximum available formant space, and we associate with each system an energy function consisting of the sum of two costs: a structural dispersion cost based on inter-vowel perceptual distances – computed through a Euclidean distance in the (F1, F'2) space, and favouring large inter-vowel distances – and a local focalisation cost based on intra-vowel perceptual salience, which aims at giving perceptual preference to vowels showing a convergence between two formants, that is, vowels with close F1 and F2, F2 and F3, or F3 and F4. The model is controlled by two parameters: λ, specifying the weight of F'2 with respect to F1 in the dispersion cost, and α, specifying the weight of the focalisation cost relative to the dispersion cost.
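As a concrete illustration, the combination of the two costs can be sketched as follows. This is a simplified stand-in, not the published implementation: the F'2 integration is omitted (F2 stands in for F'2), and the exact cost formulas and default parameter values are ours.

```python
import itertools

def df_energy(system, lam=0.25, alpha=0.35):
    """Sketch of a Dispersion-Focalisation energy for a vowel system.

    Each vowel is a tuple (F1, F2, F3, F4) in Barks. The F'2
    integration of the DFT is skipped: F2 stands in for F'2.
    """
    points = [(f1, f2) for (f1, f2, f3, f4) in system]

    # Structural dispersion cost: inverse squared perceptual distances,
    # so well-separated vowels yield a low cost; lam weights the second
    # formant axis relative to F1.
    dispersion = 0.0
    for (f1a, f2a), (f1b, f2b) in itertools.combinations(points, 2):
        dispersion += 1.0 / ((f1a - f1b) ** 2 + (lam * (f2a - f2b)) ** 2)

    # Local focalisation cost: a negative (rewarding) term for each
    # pair of adjacent formants that come close together.
    focalisation = 0.0
    for (f1, f2, f3, f4) in system:
        for lo, hi in ((f1, f2), (f2, f3), (f3, f4)):
            focalisation -= 1.0 / ((hi - lo) ** 2 + 1e-6)

    return dispersion + alpha * focalisation
```

With α = 0 the criterion reduces to pure dispersion and the optimum is the most spread-out configuration; raising α shifts the optimum towards configurations containing focal vowels, i.e. vowels with two converging formants.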
Then, for a fixed number of vowels in a system, we implemented various algorithms to select optimal systems, that is systems with the lowest energy (the best compromise between dispersion and focalisation), either locally (“stable systems”) or globally (“best systems”) (Schwartz et al., 1997b). Our predictions of optimal vowel systems were then systematically compared to vowel inventories, according to the UCLA UPSID database (Maddieson, 1984; Maddieson & Precoda, 1989).

2.1.3 Phase spaces

For a given number of vowels, from 3 to 9 (beyond this limit, vowel systems introduce a new dimension, mainly nasality and less often quantity; Vallée, 1994), we can predict, in the DFT framework, different vowel systems in the (λ, α) space. This leads to the determination of what we call “the phase space”, a well-known procedure in thermodynamics used to predict the states of a substance (such as the states of water: steam, liquid and ice) as a function of pressure and temperature. The general trend is that, for a given number of vowels in a system, decreasing λ favours peripheral systems while increasing it favours systems with one and then two high non-peripheral vowels; and increasing α favours focal vowels, and particularly stabilises [ ] within an [ ] high series, while this series is unstable when α is set to 0. Previous work allowed us to verify that these predictions were more or less compatible with the preferred phonological vowel systems observed in the UPSID317 database (Maddieson, 1984).
Considering that peripheral systems are generally preferred from 3 to 7 vowels and that the [ ] series of high vowels exists in a significant number of cases in the database (about 5% of the cases in the whole database, and 13% of the cases for systems with 7 vowels or more), we showed that setting the λ value around 0.2 to 0.3 and the α value around 0.3 to 0.4 led to quite acceptable predictions (Schwartz et al., 1997b). In the present work, we try to go one step further: we shall attempt to determine where in the phase spaces one can find the different systems, preferred or not, existing in UPSID451, and what kinds of “superstructures” can be derived from this analysis.

2.1.4 Structural symmetries between vowel systems: a typological equivalence criterion

2.1.4.1 Prototypical structures in phase spaces

Our previous simulations led to “prototypical systems”. These are winning n-vowel systems in the DFT framework, in the sense that they have a minimal global Dispersion-Focalisation (DF) energy, according to the values of the two free parameters λ and α. We have focused our study on values of n from 3 to 7 because they allow us to capture the most significant phonological tendencies of the UPSID database. The DFT simulation results are given in Figures 2-6, respectively for n = 3, 4, 5, 6 and 7. For each value of n, the phase space determines regions in the (λ, α) space in which a given system wins (with its vowel qualities displayed as black points on a prototypical grid). We see that there are two prototypical systems for n = 3, which we call S3T1 and S3T2. There are four prototypical systems for n = 4, 5 and 6, and five prototypical systems for n = 7; let us call them SnTi, with n from 3 to 7 and i from 1 to 5. The global trend is that increasing n increases the dispersion cost of peripheral systems, hence it decreases the λ boundary necessary for making these systems optimal. Hence peripheral systems are favoured by small values of λ.
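The phase-space construction itself is easy to sketch: scan a grid of (λ, α) values and record, at each grid point, which candidate system has the minimal energy. The energy function below is a deliberately crude two-formant stand-in for the published DF criterion, and all names and formant values (in Barks) are illustrative.

```python
import itertools

def energy(system, lam, alpha):
    """Toy DF energy over (F1, F'2) points in Barks: schematic
    dispersion (inverse squared distances) plus a focalisation
    reward for F1-F'2 proximity."""
    dispersion = sum(
        1.0 / ((a[0] - b[0]) ** 2 + lam * (a[1] - b[1]) ** 2)
        for a, b in itertools.combinations(system, 2))
    focalisation = -sum(1.0 / ((f2 - f1) ** 2 + 1e-6) for f1, f2 in system)
    return dispersion + alpha * focalisation

def phase_space(candidates, lams, alphas):
    """Map each (lam, alpha) grid point to the name of the winning
    (minimal-energy) candidate system, as in a phase diagram."""
    return {(lam, al): min(candidates,
                           key=lambda name: energy(candidates[name], lam, al))
            for lam in lams for al in alphas}

# Two 3-vowel candidates: a peripheral [i a u]-like system, and one
# with a high central vowel in place of the back rounded one.
peripheral = [(2.5, 15.0), (7.0, 10.0), (2.5, 7.0)]
central = [(2.5, 15.0), (7.0, 10.0), (2.5, 11.0)]
```

At small λ the F'2 axis is compressed, so the two high vowels of the central candidate (which differ only in F'2) become perceptually too close and the peripheral candidate wins the corresponding region of the grid, in line with the trend described above.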
When λ is too small, the vowel space is completely vertically stretched (since higher formants play a minimal part in the determination of vowel phonetic quality); this favours asymmetrical peripheral configurations because of the interactions between front and back peripheral vowels in the systems. Non-peripheral configurations, that is systems with more than two high vowels, appear with large λ values; and when α increases, focal vowels (especially [ ] with close F3 and F4, other front unrounded vowels together with [ ], all with close F2 and F3, and back rounded vowels, with close F1 and F2) are favoured. Decreasing α leads to the replacement of the high rounded vowel [ ] with an acoustically more central high vowel (namely [ ] or [ ]).

2.1.4.2 Reverse prototypical structures

We hypothesised that two structures having the same number of peripheral vowels but systematically replacing front unrounded vowels by back rounded ones of the same height, and vice versa, are equivalent structures in the sense of the DFT, that is to say that they have roughly the same DF energy for a given value of n and of the (λ, α) pair. This was systematically verified by comparing the energy of the SnTi prototypical systems with reverse systems that we called SnTi*. For example, for n = 4 we compared S4T1 = [ ] with S4T1* = [ ], and S4T2 = [ ] with S4T2* = [ ], S4T3 = [ ] and S4T4 = [ ] having no reverse counterpart. Indeed, we confirmed that SnTi* structures have a DF energy quite close to that of the SnTi ones whatever the region of the phase space, that is to say whatever the λ and α values. Pushing the analogy with physics one step further, this reminds us of the “polymorphism” of a number of solids (e.g. metals or crystals). In this situation, while fusion produces a homogeneous liquid phase, solidification leads to mixtures of two or more variants of the solid phase, all stable and with more or less the same energy.
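The reverse mapping SnTi → SnTi* amounts to a reflection of the system in the (F1, F'2) plane; a minimal sketch, in which the F'2 range bounds (in Barks) are illustrative:

```python
def reverse_system(system, f2_min=7.0, f2_max=15.0):
    """Reflect each vowel's effective second formant F'2 about the
    centre of the available F'2 range, swapping front unrounded
    vowels with back rounded ones of the same height (F1 is kept).
    Vowels are (F1, F'2) pairs in Barks."""
    centre = 0.5 * (f2_min + f2_max)
    return [(f1, 2.0 * centre - f2) for (f1, f2) in system]
```

Because a reflection preserves all pairwise distances in the (F1, F'2) space, the dispersion cost of SnTi* is identical to that of SnTi; this is the geometric reason why the two variants end up with nearly the same DF energy whatever the (λ, α) values.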
This is exactly what happens here with the two variants within a given phase. Hence, our typologies of phase spaces involve “superstructures” grouping prototypical structures and reverse ones (displayed with white points instead of black ones in Figures 2-6). The relevance of these superstructures for describing the UPSID database will be discussed in the next section.

2.1.5 Comparing UPSID data with DFT simulations

2.1.5.1 UPSID data reanalysed

Since Trubetzkoy and his Principles (1939), taxonomy has not only been an approach of historical linguistics: associated with research on synchronic trends, it constitutes today a main stage in linguistic theories. Institutionalised in 1961, under the aegis of the Social Science Research Council during the New York Conference on Language Universals, this research field aims at finding common basic structures in languages – in diachrony as well as in synchrony. The Language Universals Project (1967-1976) led to the building up of the Stanford Phonology Archives (Greenberg et al., 1978), with which many important studies dealing with typological classification and phonological tendencies were carried out (Sedlak, 1969; Crothers, 1978; Maddieson, 1984; Vallée, 1994). But all these studies present variegated contents: data are constantly enriched, and questions on the materials vary from one author to another. The UPSID (UCLA Phonological Segment Inventory Database), with 317 and then 451 languages, gathers the phonological systems of the world’s languages, sampling all linguistic families more or less uniformly. UPSID317 (Maddieson, 1984), then UPSID451 (Maddieson & Precoda, 1989), were chosen to approximate a properly constructed quota sample on a genetic database of the world’s existing languages. UPSID was implemented at ICP several years ago and we have been using it for vowel and consonant research. In order to test our hypotheses, we have reanalysed the UPSID database of vowel systems, using a two-step methodology.
The languages in UPSID have 3 to 28 vowels. Firstly, from the raw data, that is to say without any typological equivalences, we obtain 252 types of phonological structures with 3 to 17 vowel qualities. What we call vowel qualities corresponds to “basic segments” (vs. “elaborated” and “complex” segments) in the sense of Lindblom & Maddieson (1988). We note that more than 96% of the languages have from 3 to 10 basic vowel qualities, and if we focus our study on systems with 3 to 7 qualities, we obtain 77% of the 451 languages (348 systems). This is due to the fact that there are in many cases more vowels than vowel qualities in a given system; for instance, / / is the phonological structure of four UPSID languages, of which three have more than 5 vowels: Chipewyan with 14 vowels / /, Siriono (12) / /, Tamang (10) /i e a o u/. The systems with nasal, laryngeal, pharyngeal or retroflexed vowels sharing no vowel quality with a basic segment, as opposed to the systems quoted above, have been discarded from follow-up analyses. This results in eliminating less than 3% of UPSID’s languages and 3.4% of the languages having from 3 to 7 vowels, that is seven languages with 6 vowel qualities and five languages with 7 qualities, for instance the Cherokee system / / or the Tarascan system /i /. At this stage we retain 336 systems of the database. Secondly, we take into account the so-called “transparency rule” (Schwartz et al., 1997a). This rule states that schwa should be conceived as a separate class, considering that it does not seem to interfere with the other vowels in a system: indeed, adding schwa to a system or removing it does not disrupt the structural organisation of that system. The “transparency rule” concerns 64 languages with 4 to 8 vowel qualities. For instance, we have classed the Ivatan structure / / as S3T2, Achumawi / / as S5T2, Ndut / / as S6T1, and Fur / / as S7T2.
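The two reduction steps can be sketched as follows; the diacritic handling is a small illustrative subset of UPSID's real coding, and the example symbols are ours:

```python
def basic_qualities(inventory):
    """Step 1: collapse an inventory (IPA strings) to its basic vowel
    qualities by stripping length and nasalisation marks. Only two
    diacritics are handled here, as an illustration."""
    diacritics = {"\u02d0", "\u0303"}  # length mark, combining tilde
    qualities = []
    for vowel in inventory:
        quality = "".join(ch for ch in vowel if ch not in diacritics)
        if quality not in qualities:
            qualities.append(quality)
    return qualities

def transparency_rule(qualities):
    """Step 2: treat schwa as transparent and remove it before the
    structural classification."""
    return [q for q in qualities if q != "\u0259"]  # schwa
```

For example, a hypothetical 7-vowel inventory /i iː ĩ a o u ə/ reduces to 5 basic qualities in step 1 and then, by transparency, to the 4-quality structure that is actually classified.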
The “transparency rule” results in a slight increase in the number of systems in the analysis, thanks to six 8-vowel systems which become 7-vowel ones. Hence, at this last stage, we retain 342 systems, that is to say 75.8% of the database (Figure 1).

Figure 1: Distribution of UPSID’s languages by number of basic vowel qualities (x-axis: number of vowel qualities; y-axis: number of languages). We focus our study on systems with 3 to 7 qualities.
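As a quick arithmetic check, the counts reported in this section are mutually consistent; all figures below come from the text above, none are new data:

```python
# 336 systems remain after discarding the non-matching nasal/laryngeal/
# pharyngeal/retroflexed systems; the transparency rule then reclassifies
# six 8-vowel systems as 7-vowel ones, bringing them into the 3-7 range.
retained = 336 + 6
assert retained == 342

# 342 of the 451 UPSID languages is the 75.8% share quoted above.
assert round(100 * retained / 451, 1) == 75.8
```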